On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies
Abstract
This paper studies convergence properties of optimal values and actions for discounted and average-cost Markov Decision Processes (MDPs) with weakly continuous transition probabilities and applies these properties to the stochastic periodic-review inventory control problem with backorders, positive setup costs, and convex holding/backordering costs. The following results are established for MDPs with possibly noncompact action sets and unbounded cost functions: (i) convergence of value iterations to optimal values for discounted problems with possibly non-zero terminal costs, (ii) convergence of optimal finite-horizon actions to optimal infinite-horizon actions for total discounted costs, as the time horizon tends to infinity, and (iii) convergence of optimal discounted-cost actions to optimal average-cost actions for infinite-horizon problems, as the discount factor tends to 1. Applied to the setup-cost inventory control problem, the general results on MDPs imply the optimality of (s, S) policies and convergence properties of optimal thresholds. In particular, this paper analyzes the setup-cost inventory control problem without two assumptions often used in the literature: (a) the demand is either discrete or continuous or (b) the backordering cost is higher than the cost of backordered inventory if the amount of backordered inventory is large.
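To make result (i) concrete, the following is a minimal sketch of value iteration for a discounted setup-cost inventory problem. It is not the paper's construction: it assumes a small discrete demand distribution, a state space truncated to a finite integer range, a zero terminal cost, and illustrative cost parameters (`K`, `c`, `h`, `b`, `alpha`, `lo`, `hi` are all hypothetical names and values chosen for this example). The paper itself treats general demand distributions and an unbounded state space.

```python
import numpy as np

def value_iteration_inventory(K=10.0, c=1.0, h=1.0, b=5.0, alpha=0.9,
                              demand_pmf=(0.25, 0.25, 0.25, 0.25),
                              lo=-20, hi=20, tol=1e-8, max_iter=1000):
    """Value iteration on a truncated integer inventory model (illustrative only).

    K: setup cost, c: unit ordering cost, h: holding cost per unit,
    b: backordering cost per unit, alpha: discount factor.
    Demand takes values 0, 1, ..., len(demand_pmf) - 1.
    """
    states = np.arange(lo, hi + 1)
    n = len(states)
    pmf = np.asarray(demand_pmf, dtype=float)
    demands = np.arange(len(pmf))

    # Inventory after demand, for every post-order level y and demand value d.
    after_demand = states[:, None] - demands[None, :]
    # Truncation assumption: next states are clamped to [lo, hi].
    next_idx = np.clip(after_demand, lo, hi) - lo
    # Expected one-period holding/backordering cost at post-order level y.
    stage = (h * np.maximum(after_demand, 0)
             + b * np.maximum(-after_demand, 0)) @ pmf

    def post_order_cost(v):
        # G(y) = c*y + E[holding/backorder cost] + alpha * E[v(y - D)]
        return c * states + stage + alpha * (v[next_idx] @ pmf)

    v = np.zeros(n)  # zero terminal cost (an assumption of this sketch)
    residual = np.inf
    for _ in range(max_iter):
        G = post_order_cost(v)
        # Minimum of G over levels strictly above each state.
        suffix_min = np.minimum.accumulate(G[::-1])[::-1]
        best_above = np.append(suffix_min[1:], np.inf)
        # Either do not order (cost G(x)) or pay K and order up to the best y > x.
        v_new = np.minimum(G, K + best_above) - c * states
        residual = float(np.max(np.abs(v_new - v)))
        v = v_new
        if residual < tol:
            break

    # Read off a greedy order-up-to level for each state.
    G = post_order_cost(v)
    order_up_to = states.copy()
    for i in range(n - 1):
        j = i + 1 + int(np.argmin(G[i + 1:]))
        if K + G[j] < G[i]:
            order_up_to[i] = states[j]
    return states, v, order_up_to, residual

if __name__ == "__main__":
    states, v, order_up_to, residual = value_iteration_inventory()
    ordering = states[order_up_to > states]
    s = ordering.max()                      # highest level at which an order is placed
    S = order_up_to[states == s][0]         # level ordered up to from s
    print(f"residual={residual:.2e}, s={s}, S={S}")
```

In this truncated model the Bellman operator is a contraction with modulus `alpha`, so the sup-norm residual shrinks geometrically. The printed `s` and `S` are simply read off the greedy policy of the truncated model; whether that policy has an exact (s, S) form here depends on the truncation and is not asserted by this sketch.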
Related resources
Optimality Inequalities for Average Cost Markov Decision Processes and the Optimality of (s, S) Policies
For general state and action space Markov decision processes, we present sufficient conditions for convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time. We extend Schäl’s [24] assumptions, guaranteeing the existence of a solution to the average cost optimality inequalities for compact action sets, to non-com...
On the optimality equation for average cost Markov decision processes and its validity for inventory control
As is well known, average-cost optimality inequalities imply the existence of stationary optimal policies for Markov decision processes with average costs per unit time, and these inequalities hold under broad natural conditions. This paper provides sufficient conditions for the validity of the average-cost optimality equation for an infinite state problem with weakly continuous transition prob...
Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs
Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and serve as a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...
Optimality Inequalities for Average Cost Markov Decision Processes and the Stochastic Cash Balance Problem
For general state and action space Markov decision processes, we present sufficient conditions for the existence of solutions of the average cost optimality inequalities. These conditions also imply the convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time case. Inventory models are natural applications of ou...
One-for-One Period Policy and its Optimal Solution
In this paper we introduce the optimal solution for a simple yet practical inventory policy whose key characteristic is that it eliminates demand uncertainty for suppliers. In this new policy, which differs from the classical inventory policies, the time interval between any two consecutive orders is fixed and the quantity of each order is one. Assuming the fixed ordering cos...